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Abstract: This paper investigates whether remote sensing evapotranspiration estimates can 
be integrated by means of data assimilation into a distributed hydrological model for 
improving the predictions of spatial water distribution over a large river basin with an area 
of 317,800 km . A series of available MODIS satellite images over the Haihe River basin 
in China are used for the year 2005. Evapotranspiration is retrieved from these lxl km 
resolution images using the SEBS (Surface Energy Balance System) algorithm. The 
physically-based distributed model WEP-L (Water and Energy transfer Process in Large 
river basins) is used to compute the water balance of the Haihe River basin in the same 
year. Comparison between model-derived and remote sensing retrieval basin-averaged 
evapotranspiration estimates shows a good piecewise linear relationship, but their spatial 
distribution within the Haihe basin is different. The remote sensing derived 
evapotranspiration shows variability at finer scales. An extended Kalman filter (EKE) data 
assimilation algorithm, suitable for non-linear problems, is used. Assimilation results 
indicate that remote sensing observations have a potentially important role in providing 
spatial information to the assimilation system for the spatially optical hydrological 
parameterization of the model. This is especially important for large basins, such as the 



Sensors 2008, 8 



4442 



Haihe River basin in this study. Combining and integrating the capabilities of and 
information from model simulation and remote sensing techniques may provide the best 
spatial and temporal characteristics for hydrological states/fluxes, and would be both 
appealing and necessary for improving our knowledge of fundamental hydrological 
processes and for addressing important water resource management problems. 

Keywords: Evapotranspiration; Distributed hydrological model; Data assimilation; WEP; 
SEBS; Extended Kalman Filter. 



1. Introduction 

Because water is becoming the limiting factor for development in many parts of the world, more 
systematic approaches are needed to analyze the uses, depletion, and productivity of water. An 
improved knowledge of the land surface hydrologic states and fluxes, and of their spatial and temporal 
variability across different scales, is urgently needed in many hydrologic studies and water resources 
management [1-2]. At present many tools can help for water budget analysis, including distributed 
models, geographic information systems (GIS) and remote sensing techniques. 

A great variety of distributed hydrological models have been developed, ranging from simple 
empirical equations, to complex systems of partial differential equations, which can incorporate the 
spatial distribution of various inputs and boundary conditions, such as topography, vegetation, land 
use, soil characteristics, rainfall, and evaporation, and produce spatially detailed outputs such as soil 
moisture, water table, groundwater fluxes, and surface saturation patterns. However, distributed 
modeling of hydrological processes has its limitations. The major problems are over-parameterization 
and uncertainty, in the sense that most models have not been validated in all their details. New data 
sources for observation of hydrological processes can alleviate some of the problems facing the 
validation and operational use of hydrological models. Satellite, airborne and ground-based remote 
sensing has begun to fulfill some of its potential for hydrological applications, allowing monitoring and 
measurement of rainfall, snow, soil moisture, vegetation, surface temperature, energy fluxes, and land 
cover over large areas. The main reason is that remote sensing data can provide large-scale, systematic 
land surface observations consistently over the large scale [3]. 

The integration of data and models is referred to as data assimilations (DA) which provides a means 
of integrating data in a consistent manner with model predictions and merge measurements of different 
types, accuracies, and resolutions into spatially distributed models [4-5]. For example, remotely sensed 
observations and land surface modeling have been integrated in both NASA's Global Water and 
Energy Cycle (GWEC) program, and the World Climate Research Cycle (GWEC) program, and the 
World Climate Research Programme's (WCRP) Global Energy and Water Experiment (GEWEX). 

The fundamental operative unit for water resources management is the catchment or river basin. 
Combining and integrating the capabilities and information within an integrated framework from 
simulation and remote sensing techniques is thus both appealing and necessary for improving our 
knowledge of fundamental hydrological processes and for supporting water resources management in 
the catchment or basin scale. Therefore, it's urgent to investigate appropriate DA and modeling 
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approaches at the catchment/river basin scale, and more case studies should be conducted to realize the 
operational potential of DA for a broad range of water resources management problems. Therefore, the 
work presented in this paper is motivated by the need to develop an improved data assimilation system 
that use remote sensed ET to improve the predictive performance of a distributed hydrological model 
in a large river basin. 

Many studies in hydrological modeling of DA have begun to appear in recent years [e.g. 6-22], 
spurred by the success of DA in other fields. Most of these studies focus on the assimilation of soil 
moisture data in land surface models, and rely on synthetic datasets to assess the performance of the 
DA algorithms. Much less work has been published on assimilating remote sensed evapotranspiration 
(ET)/latent flux (LE) into hydrological models at the regional/basin scale. Schuurmans et al. [23] 
address the question of whether remotely sensed latent heat flux estimates from Surface Energy 
Balance Algorithm for Land (SEBAL) over a catchment can be used to improve distributed 
hydrological model computations using a constant gain Kalman filter data assimilation algorithm in the 
Drentse Aa catchment with an drainage area of 300 km . Pan et al. [3] proposed and tested a data 
assimilation system that consisted of a combination of two filters - a particle filter (PF) [24-25] and an 
ensemble Kalman filter (EnKF) [26-27] to estimate the water budget using a MODIS based estimate of 
surface evapotranspiration (ET) over the spatial domain of Red- Arkansas river basin. 

2. A data assimilation system for water budget predictions 

2.1 Overview 

The scheme of the data assimilation system is sketched in Figure 1. In this paper a series of MODIS 
satellite images available for the Haihe basin for the year 2005 are used. Evapotraspiration is retrieved 
from these lxl km resolution images using the SEBS (Surface Energy Balance System) algorithm [28]. 
The physically-based distributed model WEP-L (Water and Energy transfer Process in Large river 
basins) [29] is used to compute the water balance of the Haihe River basin in the same year. An 
extended Kalman filter (EKF) data assimilation algorithm, suitable for non-linear problems, is used. 

In order to attain the water budgets for water resources assessment and planning, the specific setup 
of the WEP-L model is made to provide an accurate estimate of the magnitude of the different 
components of the hydrologic cycle in large basins like Haihe River basin. ET is an important 
component for water budget analysis, because ET can be regard as the net consumption of water. 
Remote sensing algorithm (e.g. SEBS) can provide spatially distributed estimates of evapotranspiration, 
although continuous time series with time steps are difficult. It is expected that the WEP-L model can 
benefit by assimilating the spatial distributed ET estimates provided by the SEBS, and give a better 
understanding about how the availability of actual evapotranspiration varies both spatially and 
temporally. The physical models, remote sensing retrieval tool, data assimilation techniques and data 
sources are further discussed below. 
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Figurel. Data assimilation system scheme 
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2.2 Description of the WEP-L model 

With the computational resources available today to most modelers, it has become feasible to build 
and apply highly complex distributed hydrological models that represent many different processes and 
consist of many model elements. The distributed hydrological model WEP-L was developed in a 
national key basic research project of China [29-31]. The WEP-L model is based on the WEP model 
[32-34] which has been successfully applied in several watersheds in Japan, Korean and China with 
different climate and geographic conditions [32, 34-39]. The WEP-L model adopts the contour bands 
as the calculation units to fit for large river basins and has been applied in the Yellow river basin in 
China. For details one is referred to Jia et al. [29-31]. 

The vertical structure of WEP-L within a contour band is shown in Figure 2(a), and the horizontal 
structure of WEP-L within a sub-watershed is shown in Figure 2(b). Land use is divided into five 
groups within a contour band, namely Soil Vegetation (SV) group, Non-irrigated farmland (NF) group, 
Irrigated Farmland (IF) group, Water Body (WB) group and Impervious Area (IA) group. The SV 
group is further classified into bare soil land, tall vegetation (forest or urban trees) and short vegetation 
(grassland). The IA group consists of impervious urban cover, urban canopy and rocky mountain. The 
areal average of water and heat fluxes from all land uses in a contour band produces the averaged 
fluxes in the contour band. For pervious groups of SV, NF and IF, nine vertical layers, namely an 
interception layer, a depression layer, three upper soil layers, a transition layer, an unconfined aquifer, 
an aquitard and a confined aquifer, are included in the model structure. 
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Figure 2. Schematic illustration of WEP-L model structure (Jia et al., 2006): (a) vertical 
structure within a contour band, and (b) horizontal structure within a sub-watershed. 
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The simulated hydrological processes include snow melting, evapotranspiration, infiltration, surface 
runoff, subsurface runoff, groundwater flow, overland flow, river flow, and water use. The simulated 
energy transfer processes include short-wave radiation, long-wave radiation, latent heat flux, sensible 
heat flux, and soil heat flux. Adopted modeling approaches for hydrological and energy processes are 
referred to in Jia et al. [34] (200 lb) except snow melting and water use processes. 

WEP-L integrates the merits of distributed hydrological models with those of SVATS (Soil- 
Vegetation-Atmosphere Transfer Schemes) model, couples the simulation of water cycle and energy 
processes, and calculates evapotranspiration from each land use separately. Evapotranspiration consists 
of interception of vegetation canopies (evaporation from the wet part of leaves), evaporation from 
water bodies, soil, urban cover and urban canopy and transpiration from the dry fraction of leaves with 
the source from the three upper soil layers. The evaporation from water bodies or ponded water in the 
depression storage is calculated with the Penman equation. The evaporation from the impervious area 
is taken as the smaller one of current depression storage and the potential evaporation. The 
computation of interception is referred to the Noilhan and Planton [40] model that is an interception 
reservoir method. The evaporation from soil is assumed to come only from the topsoil layer. The 
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Penman equation is adopted to compute potential evaporation from which actual evaporation of soil is 
computed using a wetness function suggested by Lee and Pielke [41]. The actual transpiration is 
calculated using the Penman-Monteith equation [42] and the canopy resistance [40] which is related to 
the soil moisture condition. The average evapotranspiration in a contour band is obtained by areally 
averaging those from each land use. 

2.3 Description of the SEBS algorithm 

With the rapid development and increased application of remote sensing technology, 
evapotranspiration calculation methods using remote sensing techniques have become a major trend for 
hydrological research in recent years. The data obtained from visible, near-infrared and thermal band 
can reflect the spatial and temporal distribution of surface features, which have a great importance in 
simulating the energy balance. The SEBAL (Surface Energy Balance Algorithm for Land) proposed by 
Bastiaanssen et al. [43], which is based on land surface parameters acquired with remote sensing 
techniques. It is a semi-empirical model applied in calculating the evapotranspiration and the main 
difficulty is to determine the "hot" pixel and the "cold" pixel which has a great effect on the final 
results. Norman and Kustas [44] proposed the TSEB (Two-Source Energy Balance) algorithm to 
calculate the evaporation from bare soil and transpiration from vegetation separately based on remote 
sensing data. 

The Surface Energy Balance System (SEBS) was developed by Su [28] to estimate the atmospheric 
turbulent fluxes and evaporative fraction using satellite earth observation data, in combination with 
meteorological information at proper scales. The system retrieves evapotranspiration (ET) using 
measurements of incoming surface radiation, surface skin temperature, surface meteorology, and 
surface and vegetation properties [45]. One of the advantages in this algorithm is applying both Bulk 
Atmospheric Similarity (BAS) and the Monin-Obukov atmospheric surface layer (ASL) similarity in 
the model, which can be used for regional and local scales respectively to determine the turbulent 
fluxes. Another important merit of SEBS is the inclusion of a physical model which takes surface 
heterogeneity into account in the estimation of the roughness height for heat transfer. The SEBS 
algorithm has been successfully applied for many applications of evaporation estimations in many 
different places with different scale [46-51]. 

For the purpose of this study, only basic equations for ET retrieval will be briefly described, full 
details are given by Su [28]. The surface energy balance is commonly written as 

R n =G 0 + H + AE (1) 

Where R n is the net radiation, G 0 is the soil heat flux, H is the turbulent sensible heat flux, and 

XE is the turbulent latent heat flux ( X is the latent heat of vaporization and E is the actual 
evapotranspiration). 

To determine the evaporative fraction (to be defined below), use is made of energy balance 
considerations at limiting cases. Under the dry- limit, the latent heat (or the evaporation) becomes zero 
due to the limitation of soil moisture and the sensible heat flux is at its maximum value. 

From Eqn. (1), it follows, 
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AE dry =R n -G 0 -H dry =0, or 



H dry - R n~ G 0 



(2) 



Under the wet-limit, where the evaporation takes place at potential rate, AE wet , (i.e. the evaporation 

is limited only by the energy available under the given surface and atmospheric conditions), the 
sensible heat flux takes its minimum value, H wet , i.e. 

AE wet =K-G 0 -H wet , or 

(3) 

The relative evaporation then can be evaluated as 

A _1 H ~ H wet 

11 dry 11 wet 



The evaporative fraction is finally given by: 

A= XE _A r -ZE wel 



R-G i? 



(5) 



By inverting Eqn. (5), the actual latent heat flux XE can be obtained. 

The actual sensible heat flux H is determined with the bulk atmospheric similarity approach and is 
constrained in the range set by the sensible heat flux at the wet limit H wet , and that at the dry limit 

H dry . H dry is given by Eqn. (2) and H wet can be derived by a combination equation [52] similar to the 

Penman-Monteith combination equation with the assumption of a completely wet situation. Other 
details are given in Su [28]. 

2.4 The extended Kalman filter 

Data assimilation techniques can be used to improve model performance either by optimizing model 
parameters or by correcting the state produced by the model, both through a variational or a sequential 
approach. For operational purposes the combination of parameters optimization and state estimation is 
most promising. Nevertheless, in this paper we restrain ourselves to state estimation via a sequential 
approach, which provides a general framework to explicitly incorporate input, model and remote 
sensing observation errors. Among sequential data assimilation techniques, the Kalman filter (KF) 
introduced by Kalman [53] and originally devised for linear models, is the most widely used method. 
In order to perform DA on non-linear models, several different implementations of the KF have been 
devised: the Extended version (EKF) proposed by Jaswinski [54], the Ensemble version (EnKF) 
proposed by Evensen [27] and the Singular Evolutive Extended version (SEEK) proposed by Pham et 
al. [55]. 

A significant benefit of the Kalman filter is that unmeasured process states may be estimated based 
on limited, noisy process measurements. In this paper, the EKF is used for nonlinear systems that are 
linearized around the current process state, which take into account the estimated errors on the model 
and on the observations to determine the correction to be applied to the states and calculates a new 
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estimation of these errors every time step. Thus, the more accurate the observations the closer the a 
posteriori state of the model will be to them. 

In order to use EKF to perform parameter estimation, parameters are treated as process variables 



with a rate of change=0. We can define the state vector 



and observation vector z: 







r f{x,e,tf 






v 0 J 



+ w(t), (6) 

z = h(x,t) + v(t) (7) 

The evolution of the process state, i, is a function of the state, x, time, t, and system parameters, 
6 . w and v are subject to zero mean Gaussian white noise functions. Their error covariance matrices 
represent the errors: Q k for the uncertainties on the model and R k for the uncertainties on the 

observations. Q k represents the errors generated by the model during one time step, and does not 

account for the total uncertainties of the model, which also include the propagation of uncertainties 
from k - 1 to time k . Therefore, a matrix P k is defined to include the propagation of the former errors. 

The heart of the Kalman filter is the gain matrix, K , which is calculated in two phases: the 
adjustment phase and the propagation phase. 

During the first phase, the Kalman gain matrix K k is derived by the following formula: 

K k =P k -H T k {x- k )[H k {x- k )P-H T k {x- k ) + R k r (8) 

Here H is the Jacobian matrix of h with respect to the observation vector. Eqn. (8) indicates that, 
for each state, the correction factor will be applied due to each observation and its calculation takes 
into account the error matrices R k and P k . 

Then the state vector is updated: the a posteriori state vector x k is equal to the a priori state vector 
x k plus the difference between the observation and state vectors multiplied by the Kalman gain: 



x k =x;+K k [z k -h k (x;)] (9) 

The a posteriori error covariance matrix P k is also updated as follows: 

P k =V-K k H k (x- k )]P k - (10) 

During the second phase, the new error covariance matrix P k+i is calculated by adding the error at 
time k P k propagated via the tangent linear and the error Q k generated during this time step, where 
F is the Jacobian matrix of / with respect to the state vector. 

P k+l = F(x k -)P k F T (x k ) + Q k (11) 

Calculation of the correction factor (Kalman gain) is an important step of the method: if this factor 
is too small, the assimilation procedure will have no effect, if it is too large, the model would forget its 
original evolution. The magnitude of the correction is dominated by the ratio of the errors on the 
observations and the model. If the uncertainties on the observations are very small (i.e. \R k I P k \ ~ 0), 

the Kalman gain will be almost equal to 1 and the states will set to the observations. Alternatively, if 
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the observations have a large uncertainty, the Kalman gain will be almost equal to 0 and the correction 
will be almost null. The above expressions show that, if Q k is assumed to be zero, the magnitude of 

the error covariance will decrease and close to a value of zero. This would be true, only if the model is 
a perfect representation of reality, and the observations are perfect. However, we know that these 
assumptions cannot be made. So a certain level of uncertainties must be retained in the data 
assimilation process. A methodology to determine the optimal value for R k and Q k should be the 

focus of a further study, but falls outside the scope of this paper. 

3. Description of study area 

The Haihe River basin is located between 35°~43°N and 112°~120E°. It neighbors the Inner 
Mongolian Plateau in the north, and the Yellow River is the borderline in the south. It faces the Bohai 
Sea to the east and borders Shanxi Plateau in the west. The Haihe basin belongs to the warm temperate 
zone with a semi-humid and semi-arid climate. The winters are dry and cold, with low rainfall in the 
spring and heavy rainfall in the summer. The average annual precipitation is 548 mm, about 80 percent 
of which falls during June to September. Its area is 317,800 square kilometers, of which 189,000 km 
is mountainous and the remainder is plain, and divided into 15 level-3 water resources areas (WRA3) 
(Figure 3). The Haihe River valley starts from the western Taihang Mountain and reaches the eastern 
Bohai Sea, running across Beijing City, Tianjin City, Hebei Province, Shanxi Province, Shandong 
Province, Henan Province, Liaoning Province and the Inner Mongolia Autonomous Region. 



Figure 3. Map of Haihe River basin and name list of WRA3s. 
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The main land use patterns in the Haihe River basin are dominated by dryland, forest land, 
shrubbery, grassland, paddy field, wetland, build-up area, bare soil, water body and bottomland (Figure 
4). 



Figure 4. Land use patterns in the Haihe basin. 
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4. Data preparation and assimilation results, discussion 

4.1 WEP-L application to the Haihe basin 

The whole drainage basin of the Haihe basin in WEP-L model is divided into 3067 sub-watersheds 
(Figure 5(a)), each of which is assigned with a Pfafstetter code [56]. Each sub-watershed in hilly and 
tableland areas is further divided into 1-10 contour bands, but no further division is performed for sub- 
watersheds in plain areas because of little topographic effects, i.e., one sub-watershed is taken as one 
contour band. According to the contour band, the whole Haihe basin is further discretized into 11,752 
hydrologic response units (HRU) (Figure 5(b)). The details of the basin subdivision and coding are 
described in Luo et al. [57]. 
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Figure 5. Subdivision of hydrological response units: (a) Subdivision of sub-basins; and 
(b) Subdivision of contour bands in mountainous area 




Table 1 shows a list of the main collected basic data on which the WEP-L input data are based. The 
data include the following categories: (1) hydro-meteorology; (2) land cover information including 
land use, vegetation, soil and water conservation, crop patterns, etc.; (3) topography, soil and 
hydrogeology; and (4) river network; etc. For data preparation and processing, it is referred to Jia et al. 
[29]. 

Model parameterization and sensitivity analysis: There are four categories of parameters in the 
WEP-L model: (1) parameters of land surface and river channel system; (2) parameters of vegetation; 
(3) parameters of soil and aquifer; and (4) parameters of snow melting. All of parameters are initially 
estimated according to land cover information, observation data, and remote sensing data. An 
explanation of estimation about some main parameters of these four categories referred to Jia et al. 
[29]. The disturbance analysis, a simple common method of sensitivity analysis, is used to analyze the 
sensitivities of main parameters and input data of the WEP-L model on the annual averages of model 
outputs. This method is that, when the model computes, one of the system parameter has an exiguity of 
change, while the other parameters are kept unchanged. Then the ratio of output change rate to the 
parameter change rate can be got, called as the sensitivity. The sensitivity analysis results indicates that 
the high sensitive include maximum depression storage of land surface, maximum soil moisture 
content (soil porosity), conductivity of river bed materials as well as thickness of soil layers. The 
middle sensitive include the conductivity and storage coefficient of aquifers, root depth and saturated 
conductivity of soils, etc. The low sensitive include vegetation parameters, manning roughness of river 
and lateral section shapes of river, etc. 
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Table 1. List of collected basic data on which the model input based 



Category Item Content 



Meteorological 


Daily rain/snow 


Data of 1502 rain stations, and 47 meteorological stations 


and hydrology 




from 1956 to 2005 




Hourly rain/snow 


Data of 47 rain stations from 1956 to 2005 




Wind speed 


Daily data of 47 meteorological stations from 1956 to 2005 




Air temperature 


Daily data of 47 meteorological stations from 1956 to 2005 




Sunshine hours 


Daily data of 47 meteorological stations from 1956 to 2005 




Humidity 


Daily data of 47 meteorological stations from 1956 to 2005 




Monthly runoff 


Data of 23 hydrologic stations from 1956 to 2005 


Remote sensing 


Landsat TM and 


1:100,000 map in 1986, 1996 and 2000 


and land use 


deduces land use 






NOAA-AVHRR 


Monthly data between 1982 and 2000 




GMS 


Monthly data between 1998 and 2002 




MODIS 


Monthly data between 2002 and 2005 


Vegetation 


Vegetation fractional 


Deduced from NOAA-AVHRR from 1982 to2000 and 




coverage 


MODIS from 2001 to 2005 




Leaf area index 


Deduced from NOAA-AVHRR from 1982 to2000 and 
MODIS from 2001 to 2005 




Crop patterns 


Data of 3rd level WRA districts in 1980, 1990 and 2000 


Topography, 


Topography 


USGS GTOPO30 (1km by 1km DEM) 


soil and 


Soil 


1:1000,000 and 1:100,000 soil classification maps in China 


geohydrology 


Geohydrology 


Parameters of geohydrology, distribution of lithology and 
thickness of aquifers 


River network 


River network 


River network map 


Water use 


Reservoir operation 


Reservoir operation information of 44 reservoirs 




Water use in irrigation 


Water use data in 75 irrigation area larger than 100 thousand 




areas 


mu 




Water use in 


Monthly water use data at the county level from 1956 to 2005 




administrative areas 






Water diversion 


Water diversion processes in representative districts 



Model calibration and validation: Sensitivity of parameters is analyzed, and then parameters with 
high sensitivity are calibrated on a basis of "try and error". 11 years (1990-2000) is selected as 
calibration period. The calibration parameters include maximum depression storage depth of land 
surface, soil saturated hydraulic conductivity, hydraulic conductivity of unconfined aquifer, 
permeability of riverbed material, Manning roughness, snow melting coefficient, and critical air 
temperature for snow melting. After the model calibration, all parameters are kept unchanged, 
continuous simulations from 1980 to 2000 are performed to verify the model by using observed 
monthly discharges at 23 main gage stations in the basin. Simulation results of model indicate that 
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average errors of annually runoff are less than 10%, Nash-Sutcliffe efficiency of monthly runoff at 
main gage stations is over 60%, and correlation coefficients between simulated and observed monthly 
runoff exceed 80%. A validation example at four stations is shown in Figure 6. Nash-Sutcliffe 
efficiency and multi-yearly errors of simulated monthly runoff at Guangtai, Huangbizhuang, Chengde 
and Daiying station are 0.4 and 6.5%, 0.68 and 5.3%, 0.72 and -0.6%, 0.66 and -3.0%, respectively. 

Figure 6. Validation of simulated monthly discharges at (a) Guantai station, (b) 
Huangbizhuang station, (c) Chengde station, and (d) Daiying station. 
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4.2 SEBS application to the Haihe basin 

In order to use SEBS algorithms, primary inputs required in this study are basically two folds: (1) 
Meteorological variables, such as, wind speed, air temperature, surface pressure, humidity and solar 
radiation; and (2) Remote sensing physical parameters derived from MODIS (Moderate-resolution 
Imaging Spectroradiometer) data onboard the Terra platform. The relevant parameters for this study are 
given in Table 2. 



Table 2. Input parameters for SEBS. 



Parameters 


Estimation from 


Surface temperature (°C) 


MODIS derivative 


Surface albedo (-) 


MODIS derivative 


NDVI (-) 


MODIS derivative 


DEM 


MODIS derivative 


PBL depth (m) 


1000 


Air temperature (°C) 


50 meteorological stations 


Relative humidity (kg/kg) 


50 meteorological stations 


Wind speed (m s" 1 ) 


50 meteorological stations 


Surface pressure (Pa) 


50 meteorological stations 



In this study the radiation, surface temperature, and surface vegetation properties are estimated from 
MODIS-Terra products [45, 51] for a detailed description of the input sources.. The MODIS sensors 
have a temporal resolution of a spatial resolution of lxl km. It is chosen because of its short revisit 
period and therefore higher chances to obtain cloud free images in the Haihe basin. The frequency of 
the MODIS-Terra data availability is once a day if it is cloud free, and the time of the satellite passing 
over the study area is around 1 1:00 a.m. local time. During the year 2005, totally 20 observations were 
available for the study area. Data preparation and processing are referred to in Shan [58]. 

The meteorological datasets were collected from the Chinese National Meteorological Center 
(NMC), which includes the standard meteorological observations over 50 meteorological stations in 
the Haihe River Basin on daily basis. Among these meteorological observations, relative humidity, 
wind speed, air temperature are measured on a hourly basis and are recorded every 6 hours at 2:00, 
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8:00, 14:00 and 20:00, while precipitation and sunshine hours are stored as daily values. A linear 
interpolation model is used to obtain the meteorological items at the satellite over passing time. The 
data is interpolated by ILWIS software to obtain the spatial meteorological items for the whole Haihe 
river basin. 

SEBS first computes the net radiation flux using MODIS-L1B products and then estimates the 
ground heat flux following the empirical approach of Monteith [42] and Kustas and Daughtry [59]. 
Remote sensed estimates of surface thermal infrared emissivity and the reflectance of red and near 
infrared bands are used to compute spatial variations in reflected short-wave and emitted long-wave 
radiation away from the land surface. A combination of the short and long-wave radiation yields the 
possibility to compute the net radiation absorbed at the surface for every pixel. It is a key step to 
compute the sensible heat flux using turbulent heat equations by considering both the effective 
roughness height and aerodynamic resistance of the surface layer as well as the atmospheric stability of 
the boundary layer. For the latter, SEBS solves the similarity stability equations on momentum and 
potential temperature [45]. The latent heat flux, or equivalently the evapotranspiration, is then taken as 
the residual of the surface energy budget found by substracting the ground heat flux and sensible heat 
flux from the net radiation. Figure 7(a) shows an example of the distributions of daily actual 
evapotranspiration (ETa) calculated by SEBS for September 17, 2005. 

Figure 7. ETa distribution in September 17, 2005 as retrieved by the SEBS algorithm: 
(a) lxl km grid mode; and (b) Converted into 1 1752 HRUs. 




4.3 Results and discussion of DA 

The boundaries of the HRUs in WEP-L often differ from the boundaries of the satellite grids. In this 
study, pixel-to-pixel SEBS estimates are lxl km grid mode, whereas WEP-L predictions are given 
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based on the 1 1752 HRUs. SEBS estimates are aggregated into 1 1752 HRUs (see Figure 7(b)) in order 
to make the data series matching for data assimilation. Although some spatial variability information 
could be missed in the process of data format conversion, SEBS ET distribution remains more spatially 
variable than model derived ET distribution. To update the WEP-L model simulated evapotranspiration, 
the model predicted actual evapotranspiration for the day with satellite coverage is used. Some obvious 
spatial differences between the simulated ET and remote sensing ET are found in Figs. 7(b) and 10 (a). 
This confirms the gap between the WEP-L model and SEBS retrievals. Although there is no definitive 
evidence to verify that the SEBS algorithm can provide better ET estimation than the WEP-L model, 
some studies on WEP-L show that its performance can be impacted due to the interferences of water 
diversions and reservoir regulations. This impact would be more obvious in the Haihe River basin, 
because over than 90% of water resources have been exploited in this region. ET simulation would be 
highly relative to water use information. So it would be difficult to attain the precise daily ET value 
with good spatial characteristics due to the difficulty to acquire the detailed water use process data. 
This would suggest that the assimilation of SEBS ET has the potential to improve the ET estimation. 

Different assimilation techniques existing correct the inputs, the internal states or the outputs of the 
models. The assimilation method used in our approach is a sequential method: the output states (ET) of 
the model are corrected using a weight-adaptive optimal interpolation algorithm based on the extended 
Kalman filter mentioned above. This methodology consists of locally correcting the value of the actual 
evapotranspiration of the WEP-L model when an observation is available. Figure 8 shows a time series 
plot of simulated ET with and without DA in a point HRU No. 610 (Sub-basin No. 327, Contour band 
No.05). At the time step k , an observation is available. The difference between the observed (SEBS) 
and a priori simulated (WEP-L) values is partially corrected and an a posteriori value of the state, 
closer to the observed value is obtained. Then the WEP-L model continues and evolves freely until 
new observations are available. In order to avoid the water balance errors when correcting the states 
produced by the model, the assimilated ET would be constrained equal to the available water in the 
model (force mode) if the attained ET is still higher than the available water in the model after data 
assimilation. 

Figure 8. Time series plot of simulated ET with and without DA in a point HRU No. 610. 
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A comparison between the area averaged daily actual evapotranspiration in the case of available 
MODIS spectral data by the WEP-L model and the SEBS algorithm is given in Figure 9. As can be 
seen from the figure, there is a good correlation (R2=0.60) between the daily ET estimated by SEBS 
and WEP-L. Although the basin averaged daily evapotranspiration show a good piecewise linear 
relationship between SEBS retrieval and WEP-L estimates in the basin level, their spatial distribution 
within the Haihe basin is different. Analyzing the SEBS and WEP-L results for 2005 (e.g., Figure 7(a) 
and Figure 10(a)), it is found that the remote sensing results contain more information about the spatial 
variability of the evapotranspiration estimates. Therefore, it is expected that above data assimilation 
system could lead to an improvement of the spatial water balance analysis using the WEP-L model. 

Figure 9. Scatter plot of the area averaged daily actual evapotranspiration in the case of 
available spectral data calculated by WEP-L and SEBS [mm]. The dashed line 
represents the 1 to 1 line, and the solid line the relationship between the data points from 
the WEP-L and SEBS. 




o.o + , , , , 

0.0 1.0 2.0 3.0 4.0 

ETa SEBS 

Table 3 summarizes the results of the SEBS, WEP-L and assimilation respectively when averaged at 
the basin scale for the available day. Their spatial man, standard deviation and RMSE are given in this 
table. This RMSE between the observations and simulations is calculated as: 



RMSE - 




n t is the number of data points, 6 0 the observed (SEBS) actual evapotranspiration, and 0 s the 

simulated actual evapotranspiration. The RMSE can offer a measure of the "potential" from data 
assimilation. If the RMSE simulation with DA is lower than that of simulation without DA, this would 
means that the assimilation impact is positive. By the comparison of the RMSE of WEP-L derived 
actual evapotranspiration with and without DA, the conclusion is that data assimilation in this study 
leads to a positive impact on the WEP-L model. However, one must keep in mind that this is not 
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sufficient for a definitive conclusion. This is because the SEBS derived actual evapotranspiration also 
contains errors. 



Table 3. Comparison of spatial mean, standard deviation and RMSE of SEBS, WEP-L 
and assimilated actual evapotranspiration for the available day. 
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Std. 


SEBS 


WEP-L without DA 
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date 
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Mean 2.032 0.785 1.962 0.908 1.424 2.003 0.544 0.576 



Figure 10(a) and (b) show an example about comparison of ETa distribution in September 17, 2005 
as calculated by the WEP-L model without and with data assimilation. As can be seen in these figures, 
the assimilation procedure results contain more spatial variability than the WEP-L derived results 
without data assimilation. Figure 11(a) and (b) show the spatially distributed difference in annual 
evapotranspiration as calculated by the WEP-L model without and with data assimilation at the HRU 
and WRA3 level, respectively. Negative values in Figl la and b mean that the actual evapotranspiration 
calculated by the model with data assimilation is lower than the actual evapotranspiration calculated 
without data assimilation. Figure 11(a) implies that SEBS estimates with limited daily remote sensing 
images yet clear spatial patterns appear in the assimilated ET distribution at the HRU level. The spatial 
differences of 1 1752 HRUs range from -48 to 32mm, and the largest relative difference can reach 16%. 
Figure 11(b) indicates that at the WRA3 level, the DA technique does not bring an obvious change to 
the yearly ET. Besides the WRA3-4 and WRA3-5, the spatial differences of most WRA3s range from - 
10 to 10mm, and the largest relative difference is less than +1.5%. The spatial differences at the 
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WRA3-4 and WRA3-5 are 16.6 and 13.6mm, and the relative differences are 4.2% and 3.7%, 
respectively. 

Figure 10. Comparison of ETa distribution in September 17, 2005 as calculated by the 
WEP-L model without and with data assimilation: (a) WEP-L ET; and (b) Assimilated 
ET. 




Above analysis show that the assimilated ET does not differ much from the WEP-L ET at the 
WRA3 level, whereas the assimilated system makes more obvious spatial difference at the HRU-level. 
It is suggested that data assimilation system could at least remain its original accuracies of the WEP-L 
model at the higher scale, whereas more detailed description for spatial water balance can be given at 
the lower scale. Can we interpret these results? One possible reason is that the lack of detail in the 
database used to parameterize the hydrological states/fluxes is very likely to contribute to the 
incompatibility of the spatially distributed hydrological parameterization of the WEP-L model. 
Because although the whole basin is divided into 11752 hydrological response units in the model 
formulation, some parameters are given at a higher scale due to limited spatial information. This means 
that remote sensing observations have a potentially important role in providing spatial information to 
the assimilation system for the spatially optical hydrological parameterization of the model. 
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Figure 11. Spatially distributed difference in yearly evapotranspiration [mm] as 
calculated by the WEP-L model without and with data assimilation at the HRU level 
and WRA3 level, respectively: (a) HRU-level; and (b) WRA3 level. Negative values 
mean that the actual evapotranspiration calculated by the model with data assimilation is 
lower than that without data assimilation. 




5. Conclusion and remarks 

The paper has demonstrated the potential of remote sensing observations for updating the spatial 
water balance of distributed hydrological model. We have used an extended Kalman filter (EKF) as our 
data assimilation algorithm to handle this type of spatial information in the WEP-L model. By means 
of this data assimilation technique, estimates of daily evapotranspiration derived from MODIS-L1B 
products using the SEBS algorithm are combined with a distributed hydrological model (WEP-L) to 
understand the spatial distribution of water balance in a large basin with an area of 317,800 km in 
China. Assimilation results indicate that remote sensing observations have a potentially important role 
in providing spatial information to the assimilation system for the spatially optical hydrological 
parameterization of the model. This is especially important for large basins, such as the Haihe River 
basin in this study. 

Both WEP-L and SEBS used in this study have robust physical mechanisms and their respective 
features. The WEP-L model is based on both water and energy balance and can provide temporally 
continuous hydrological simulation, whereas the SEBS algorithm is based on surface energy balance 
and can give more spatially variable information on surface energy fluxes. Assimilation of a 
combination of distributed hydrological model with available remote sensing data may provide the best 
spatial and temporal characteristics for hydrological states/fluxes. Therefore, combining and integrating 



Sensors 2008, 8 



4461 



the capabilities of and information from model simulation and remote sensing techniques would be 
both appealing and necessary for improving our knowledge of fundamental hydrological processes and 
for addressing important water resource management problems. 

In the data assimilation experiment reported in this study, the benefits on the prediction precision of 
the model by assimilating MODIS-based ET need be further investigated, although RMSE results show 
data assimilation have a positive impact on the model. Unlike "synthetic" data assimilation 
experiments, there are no assumed "true" hydrologic states or fluxes for the DA system in this study. 
So it is difficult to evaluate to what extent the WEP-L prediction accuracy for the water balance can be 
improved by the integration of MODIS-based ET. So a key question, should be further investigated, is 
whether an energy balance model (such as SEBS) can provide better ET estimates than a distributed 
hydrological model (e.g. WEP-L). 

A second key issue is whether more reliable spatial information can be achieved from remote 
sensing observations. For example, in this study only 20 cloud free MODIS images in the year 2005 
can be attained for evapotranspiration retrieval. It remains challenging to retrieve energy fluxes under 
cloud cover. Another challenge is that the performance of the assimilation system for the combination 
of parameters optimization and state estimation should be further tested for the purpose of operational 
application. 
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